ABSTRACT



R. J. Owen (1975) proposed an approximate empirical Bayes
procedure for item selection in adaptive testing. The procedure replaces the
true posterior by a normal approximation with closed-form expressions for its
first two moments. This approximation was necessary to minimize the
computational complexity involved in a fully Bayesian approach, but is no
longer necessary given the computational power currently available in
adaptive testing. This paper suggests several item selection criteria for
adaptive testing that are all based on the use of the true posterior. Some of
the statistical properties of the ability estimator produced by these
criteria are discussed and empirically characterized. An empirical study with
300 test items showed that the maximum predicted posterior expected
information criterion had excellent mean-squared error for more extreme
values of theta, and is the criterion of choice for application in short adaptive
tests. An appendix presents Owen's equations. (Contains 17 references.)
(Author/SLD)

Abstract

Owen (1975) proposed an approximate empirical Bayes procedure for item 
selection in adaptive testing. The procedure replaces the true posterior by a 
normal approximation with closed-form expressions for its first two moments. This 
approximation was necessary to minimize the computational complexity involved in 
a fully Bayesian approach but is no longer necessary given the computational 
power currently available in adaptive testing. This paper suggests several item 
selection criteria for adaptive testing which are all based on the use of the true 
posterior. Some of the statistical properties of the ability estimator produced by 
these criteria are discussed and empirically characterized.

Introduction

Adaptive testing is based on the principle of selecting items to match the current 
estimate of the ability of the examinee. An important choice is how to translate this 
principle into a formal criterion of item selection implementable as a computer 
algorithm. Since the early days of adaptive testing, two item selection criteria have 
been popular: the maximum-information criterion and an approximate Bayesian 
criterion proposed by Owen (1975). 

It is the purpose of this paper to introduce several new criteria for item 
selection in adaptive testing which are all Bayesian in the sense that they are 
based on the posterior distribution of the ability of the examinee. The criteria can 
be used as an alternative to Owen’s criterion, which is based on an approximate
empirical Bayes approach to adaptive testing. The approximation was introduced
at a time when the numerical complexity involved in a fully Bayesian approach was a
practical problem. However, for the computers currently in use in adaptive testing
programs, this complexity is no longer a problem.

The paper is organized as follows: First, the maximum-information criterion and
Owen's criterion are reviewed. Subsequently, several Bayesian criteria for item selection
are introduced. Next, some statistical properties of the final ability estimators for an 
adaptive test based on these criteria are discussed. The last section of the paper 
presents results from a simulation study run to characterize the properties of these 
estimators empirically. 

Model 

The two-parameter logistic model will be used as the response model under
which the items in the pool have satisfactory fit. However, the results obtained in
this paper easily generalize to any (unidimensional) item response theory (IRT)
model. To introduce the model, a random response variable $U_i$ is defined to
denote a correct ($U_i = 1$) or an incorrect ($U_i = 0$) response to item $i$. The model is
given by the following equation for the probability of success on item $i$ for an
examinee with (fixed) ability $\theta \in (-\infty, \infty)$:

$$p_i(\theta) = \mathrm{Prob}\{U_i = 1 \mid \theta\} = \frac{\exp[a_i(\theta - b_i)]}{1 + \exp[a_i(\theta - b_i)]}. \tag{1}$$

Location parameter $b_i \in (-\infty, \infty)$ and scale parameter $a_i \in [0, \infty)$ in this model are
commonly interpreted as the difficulty and discriminating power of item $i$,
respectively.
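
The response function in (1) can be sketched in a few lines of code. The function name `p_2pl` and the example parameter values are illustrative assumptions, not part of the paper; the branch on the sign of the exponent is only there to keep the computation numerically stable.

```python
import math

def p_2pl(theta, a, b):
    """Probability of a correct response under the 2PL model in (1)."""
    z = a * (theta - b)
    # Evaluate the logistic function without overflowing exp() for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

# Hypothetical item with discrimination a = 1.2 and difficulty b = 0.0.
print(p_2pl(0.0, 1.2, 0.0))   # ability equal to the item difficulty
print(p_2pl(1.0, 1.2, 0.0))   # ability above the item difficulty
```

When ability equals difficulty the success probability is one half, and a larger value of `a` makes the curve steeper around `b`.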



Maximum-Information Criterion 

To present the maximum-information criterion, the following notation is needed.
The items in the pool are denoted by $i = 1, \ldots, I$. For convenience, an adaptive test of
fixed length $n$ will be assumed. The rank of the items in the test is denoted by
index $k = 1, \ldots, n$. It follows that $i_k$ is the index of the item in the pool administered as
the $k$th item in the test. Suppose $k-1$ items have already been selected. The
indices of these items form the set $S_{k-1} = \{i_1, \ldots, i_{k-1}\}$. The remaining set of items in
the pool is denoted as $R_k = \{1, \ldots, I\} \setminus S_{k-1}$.

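For responses $U_{i_1} = u_{i_1}, \ldots, U_{i_{k-1}} = u_{i_{k-1}}$ obtained on the first $k-1$ items, the
likelihood function is

$$L(\theta \mid u_{i_1}, \ldots, u_{i_{k-1}}) = \prod_{j=1}^{k-1} \frac{\{\exp[a_{i_j}(\theta - b_{i_j})]\}^{u_{i_j}}}{1 + \exp[a_{i_j}(\theta - b_{i_j})]}. \tag{2}$$

An ML estimator (MLE) of $\theta$ based on these responses is a maximizer of (2) over
$\theta$, that is,

$$\hat{\theta}_{u_{i_1}, \ldots, u_{i_{k-1}}} = \arg\max_{\theta} \{L(\theta \mid u_{i_1}, \ldots, u_{i_{k-1}}) : \theta \in (-\infty, \infty)\}. \tag{3}$$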



Fisher’s information about the unknown value of $\theta$ in the response variables
associated with the $k-1$ items is defined as:

$$I_{U_{i_1}, \ldots, U_{i_{k-1}}}(\theta) = -E\left[\frac{\partial^2}{\partial\theta^2} \ln L(\theta \mid U_{i_1}, \ldots, U_{i_{k-1}})\right] = \sum_{j=1}^{k-1} \frac{[p_{i_j}'(\theta)]^2}{p_{i_j}(\theta)[1 - p_{i_j}(\theta)]}, \tag{4}$$

where

$$p_{i_j}'(\theta) = \frac{\partial}{\partial\theta}\, p_{i_j}(\theta),$$

and the last step in (4) is a well-known result for the model in (1) (see, for
example, Hambleton & Swaminathan, 1985, sect. 6.3). Note that (4) is additive
because of conditional independence of the response variables given $\theta$.
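
As a worked step, the derivative appearing in (4) has a simple closed form under the logistic model in (1). The identity below is the standard result for the two-parameter logistic model, stated here only to make the size of each item's contribution explicit:

$$p_i'(\theta) = a_i\, p_i(\theta)[1 - p_i(\theta)] \quad\Longrightarrow\quad \frac{[p_i'(\theta)]^2}{p_i(\theta)[1 - p_i(\theta)]} = a_i^2\, p_i(\theta)[1 - p_i(\theta)].$$

Each item therefore contributes at most $a_i^2/4$ to the sum in (4), and this maximum is attained at $\theta = b_i$, where $p_i(\theta) = 1/2$.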

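The maximum-information criterion common in adaptive testing selects the $k$th
item such that maximum information is obtained at $\theta = \hat{\theta}_{u_{i_1}, \ldots, u_{i_{k-1}}}$, i.e.,

$$i_k = \arg\max_j \{I_{U_{i_1}, \ldots, U_{i_{k-1}}, U_j}(\hat{\theta}_{u_{i_1}, \ldots, u_{i_{k-1}}}) : j \in R_k\}. \tag{5}$$

Because the information measure is additive, the criterion is equivalent to

$$i_k = \arg\max_j \{I_{U_j}(\hat{\theta}_{u_{i_1}, \ldots, u_{i_{k-1}}}) : j \in R_k\}. \tag{6}$$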
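
The selection rule in its additive form can be sketched as follows. This is a minimal illustration under the two-parameter logistic model, for which the information of a single item is $a^2 p(1-p)$; the function names, the representation of the pool as a list of `(a, b)` pairs, and the parameter values are assumptions made for the example.

```python
import math

def p_2pl(theta, a, b):
    # Logistic function written with tanh, which avoids overflow in exp().
    return 0.5 * (1.0 + math.tanh(0.5 * a * (theta - b)))

def item_information(theta, a, b):
    """Fisher information of one 2PL item at theta: a^2 * p * (1 - p)."""
    p = p_2pl(theta, a, b)
    return a * a * p * (1.0 - p)

def select_max_information(theta_hat, pool, administered):
    """Index of the unused item with maximum information at theta_hat."""
    remaining = [j for j in range(len(pool)) if j not in administered]
    return max(remaining, key=lambda j: item_information(theta_hat, *pool[j]))

# Hypothetical pool of (a, b) pairs; item 1 has already been administered.
pool = [(1.0, -1.0), (1.5, 0.2), (0.8, 0.0), (1.5, 2.0)]
print(select_max_information(0.0, pool, administered={1}))
```

Only the candidate item's own information has to be computed, because the information of the items already administered is the same for every candidate.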
Owen’s Criterion

As an alternative to the maximum-information criterion, Owen (1975) proposed
an approximate empirical Bayes procedure for adaptive testing based on the
following three-parameter normal-ogive model:

$$p_i(\theta) = c_i + (1 - c_i)\,\Phi[a_i(\theta - b_i)], \tag{7}$$

where $\Phi(\cdot)$ is the normal distribution function and $c_i$ a lower asymptote to model
the probability of guessing item $i$ correctly.

For a vector of responses to the first $k-1$ items, the likelihood function is given
by (2) with the logistic factor replaced by the normal ogive. Assuming a prior $g(\theta)$,
the following expression for the posterior distribution of $\theta$ after $k-1$ items is
obtained:

$$g(\theta \mid u_{i_1}, \ldots, u_{i_{k-1}}) = \frac{L(\theta \mid u_{i_1}, \ldots, u_{i_{k-1}})\, g(\theta)}{\int L(\theta \mid u_{i_1}, \ldots, u_{i_{k-1}})\, g(\theta)\, d\theta}. \tag{8}$$

Owen’s procedure is based on (8) as an updating procedure for the posterior with
the choice of a normal density for the prior $g(\theta)$. Item $k$ is chosen to satisfy

$$|b_{i_k} - E(\theta \mid u_{i_1}, \ldots, u_{i_{k-1}})| < \delta \tag{9}$$

for a small value of $\delta$, where $E(\theta \mid u_{i_1}, \ldots, u_{i_{k-1}})$ is the expectation of $\theta$ over (8),
now generally known as the Expected A Posteriori (EAP) estimator of $\theta$. The
procedure is stopped as soon as the variance of (8) is smaller than a prespecified
threshold value.

It should be noted that the likelihood in (8) does not have the normal family as
its class of conjugate distributions. Therefore, if a normal prior is chosen, the posterior
is not normal, and repeated updating of the posterior using (8) soon leads to a
posterior that could not be calculated in applications to real-time adaptive testing
by the computers available in the 1970s. Owen therefore proposed to replace the
true posterior by a normal approximation with the same expected value and
variance and presented closed-form expressions for the update of a normalized
posterior (see Appendix 1). The approximation was motivated by showing that for
mild bounds on $\delta$ the estimator $\hat{\theta}_{u_{i_1}, \ldots, u_{i_{k-1}}} = E(\theta \mid u_{i_1}, \ldots, u_{i_{k-1}})$ goes to the
true value of $\theta$ in mean square for $k \to \infty$ (Owen, 1975, Theorem 2).

Owen also referred to the criterion of minimization of the preposterior risk
under a quadratic loss function as an alternative to (9). This criterion selects the
item that minimizes the expected posterior variance. Computationally it is more
involved than the criterion in (9) in combination with a normal approximation to the
posterior, and for this reason the latter became widely popular as Owen’s
procedure of adaptive testing. An extensive simulation study of the statistical
properties of Owen’s procedure is reported in Weiss and McBride (1984). A
generalization of the procedure to the case of multidimensional adaptive testing is
discussed in Bloxom and Vale (1987). The criterion of minimum expected posterior
variance will be returned to later in this paper.

Bayesian Criteria

A Bayesian approach to adaptive testing is loosely defined as any approach
which uses a prior or posterior distribution to define rules for: (1) selecting the first
item; (2) estimating $\theta$; (3) selecting the next item; or (4) stopping the test.
According to this convention, the use of an informative prior to select the first item
in an adaptive testing procedure is thus an example of a Bayesian adaptive testing
procedure (van der Linden, 1996). Owen’s procedure is adaptive in that the item
selection criterion in (9) is based on the mean of the (approximate) posterior and
the posterior variance is used to stop the test. However, his procedure does not
base item selection on the full posterior which, in a Bayesian framework, is the
best reflection of the uncertainty in the current ability estimate.
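
Working with the full posterior rather than a normal approximation is straightforward with numerical integration. The sketch below evaluates the posterior update on a grid of ability values and derives the EAP estimate, the posterior variance, and the set of items satisfying the difficulty-matching rule in (9). It assumes the logistic model in (1) instead of Owen's normal-ogive model, a normal prior, and an equally spaced grid; all function names and parameter values are hypothetical.

```python
import math

def p_2pl(theta, a, b):
    # Logistic function written with tanh, which avoids overflow in exp().
    return 0.5 * (1.0 + math.tanh(0.5 * a * (theta - b)))

def posterior_on_grid(responses, grid, prior_mean=0.0, prior_sd=1.0):
    """Normalized posterior weights over `grid` for responses [(a, b, u), ...]."""
    weights = []
    for t in grid:
        w = math.exp(-0.5 * ((t - prior_mean) / prior_sd) ** 2)  # normal prior kernel
        for a, b, u in responses:
            p = p_2pl(t, a, b)
            w *= p if u == 1 else 1.0 - p                        # likelihood factor
        weights.append(w)
    total = sum(weights)
    return [w / total for w in weights]

def posterior_mean_var(grid, weights):
    """EAP estimate and posterior variance computed from the grid weights."""
    mean = sum(t * w for t, w in zip(grid, weights))
    var = sum((t - mean) ** 2 * w for t, w in zip(grid, weights))
    return mean, var

def owen_candidates(eap, pool, administered, delta):
    """Unused items whose difficulty lies within delta of the EAP estimate."""
    return [j for j, (a, b) in enumerate(pool)
            if j not in administered and abs(b - eap) < delta]

grid = [-4.0 + 0.05 * k for k in range(161)]                 # theta values in [-4, 4]
pool = [(1.0, -1.0), (1.2, 0.1), (0.9, 0.6), (1.4, 1.5)]     # hypothetical (a, b) pairs
answered = [(1.0, -1.0, 1)]                                  # item 0 answered correctly
eap, var = posterior_mean_var(grid, posterior_on_grid(answered, grid))
print(round(eap, 3), round(var, 3))
print(owen_candidates(eap, pool, administered={0}, delta=0.5))
```

A grid of this size costs a few hundred function evaluations per update, which illustrates why the exact posterior is no longer a computational obstacle.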

This section introduces several alternative criteria for item selection based on 
the full posterior. The first two criteria generalize the idea of maximum information 
in a Bayesian fashion. The next criterion is the one of minimum expected posterior 
variance also discussed in Owen (1975). The fourth criterion combines the ideas of 
posterior weighing and preposterior prediction underlying the first two criteria into a 
new one. Finally, some other Bayesian procedures of item selection are alluded to. 

Maximum Posterior Expected Information 

The first criterion reformulates the maximum information criterion in a Bayesian 
fashion by first choosing the appropriate information measure and then taking its 
expectation across the posterior distribution. 

If the $k$th item is selected, responses to the first $k-1$ items are already known.
Hence, these data can no longer be represented by random variables but only by
the (fixed) values of their realizations. As a consequence, Fisher’s information,
defined as an expected value across random data, is no longer a valid measure. A
typical Bayesian choice is to use the observed information measure

$$J_{u_{i_1}, \ldots, u_{i_{k-1}}}(\theta) = -\frac{\partial^2}{\partial\theta^2} \ln L(\theta \mid u_{i_1}, \ldots, u_{i_{k-1}}), \tag{10}$$

which reflects the relative curvature of the observed likelihood function at the value
of $\theta$.

Though the distinction between the two information measures is important, it is
easy to show that, under the model in (1), the second derivative in the right-hand
side of (10) is the same for each possible response vector (for a derivation, see
Veerkamp, 1996). Therefore, it holds for this model that

$$J_{u_{i_1}, \ldots, u_{i_{k-1}}}(\theta) = I_{U_{i_1}, \ldots, U_{i_{k-1}}}(\theta). \tag{11}$$

However, to retain generality, the distinction between the two information
measures will be maintained.

The proposal is to select the next item to maximize the expected value of the
observed information $J(\theta)$ over the posterior distribution of $\theta$. The index of the $k$th
item according to this criterion is:

$$i_k = \arg\max_j \left\{\int J_{U_j}(\theta)\, g(\theta \mid u_{i_1}, \ldots, u_{i_{k-1}})\, d\theta : j \in R_k\right\}, \tag{12}$$

where $g(\theta \mid u_{i_1}, \ldots, u_{i_{k-1}})$ is the posterior update obtained from (8).

The criterion is a generalization of the likelihood weighted information criterion
introduced in Luecht (1995) and Veerkamp and Berger (in press). The advantage
of using the posterior for weighing the information is the possibility to incorporate
prior knowledge about $\theta$ in the item selection procedure. Use of this possibility is
recommended when data on background variables with a statistical relation to $\theta$
are available (van der Linden, 1996).

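The posterior-weighted criterion can be sketched numerically by replacing the integral with a sum over a grid of ability values. The sketch assumes the logistic model in (1), for which the observed information of an item equals $a^2 p(1-p)$ regardless of the response, together with a normal prior; the function names, the grid, and the item parameters are hypothetical.

```python
import math

def p_2pl(theta, a, b):
    # Logistic function written with tanh, which avoids overflow in exp().
    return 0.5 * (1.0 + math.tanh(0.5 * a * (theta - b)))

def info_2pl(theta, a, b):
    """Information of one 2PL item at theta: a^2 * p * (1 - p)."""
    p = p_2pl(theta, a, b)
    return a * a * p * (1.0 - p)

def posterior_on_grid(responses, grid, prior_mean=0.0, prior_sd=1.0):
    """Normalized posterior weights over `grid` for responses [(a, b, u), ...]."""
    weights = []
    for t in grid:
        w = math.exp(-0.5 * ((t - prior_mean) / prior_sd) ** 2)
        for a, b, u in responses:
            p = p_2pl(t, a, b)
            w *= p if u == 1 else 1.0 - p
        weights.append(w)
    total = sum(weights)
    return [w / total for w in weights]

def select_max_posterior_expected_info(pool, administered, responses, grid, **prior):
    """Unused item maximizing the information averaged over the current posterior."""
    post = posterior_on_grid(responses, grid, **prior)

    def expected_info(j):
        a, b = pool[j]
        return sum(w * info_2pl(t, a, b) for t, w in zip(grid, post))

    remaining = [j for j in range(len(pool)) if j not in administered]
    return max(remaining, key=expected_info)

grid = [-6.0 + 0.05 * k for k in range(241)]          # theta values in [-6, 6]
pool = [(1.5, 2.0), (1.5, 0.0), (0.7, 0.0)]           # hypothetical (a, b) pairs
print(select_max_posterior_expected_info(pool, set(), [], grid))
print(select_max_posterior_expected_info(pool, set(), [], grid, prior_mean=2.0))
```

The second call shows how an informative prior changes the first item: shifting the prior mean moves the posterior mass toward the more difficult item.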
Maximum Predicted Expected Information

The following criterion predicts the probability distribution of the responses of
the examinee on each item $j \in R_k$ and selects the item with maximum expected
information over this probability distribution. More specifically, for each item $j \in R_k$
the distribution of $U_j$ is given by the probabilities $\{p_j(\theta), 1 - p_j(\theta)\}$. The best
prediction of these probabilities is their evaluation at the current estimate
$\hat{\theta}_{u_{i_1}, \ldots, u_{i_{k-1}}}$. To maintain the Bayesian framework, $\hat{\theta}_{u_{i_1}, \ldots, u_{i_{k-1}}}$ is chosen to
be the maximum a posteriori (MAP) or expected a posteriori (EAP) estimator
throughout this paper. If at the next stage item $j$ would be chosen and response
$U_j = 0$ obtained, the new estimate of $\theta$ would be $\hat{\theta}_{u_{i_1}, \ldots, u_{i_{k-1}}, U_j=0}$ and observed
information would be equal to $J_{u_{i_1}, \ldots, u_{i_{k-1}}, U_j=0}(\theta)$. Analogously, if $U_j = 1$, the new
estimate would become $\hat{\theta}_{u_{i_1}, \ldots, u_{i_{k-1}}, U_j=1}$ and observed information would be
equal to $J_{u_{i_1}, \ldots, u_{i_{k-1}}, U_j=1}(\theta)$.

The maximum predicted expected information criterion selects as the $k$th item:

$$
\begin{aligned}
i_k = \arg\max_j \Big\{ & [1 - p_j(\hat{\theta}_{u_{i_1}, \ldots, u_{i_{k-1}}})]\, J_{u_{i_1}, \ldots, u_{i_{k-1}}, U_j=0}(\hat{\theta}_{u_{i_1}, \ldots, u_{i_{k-1}}, U_j=0}) \\
& + p_j(\hat{\theta}_{u_{i_1}, \ldots, u_{i_{k-1}}})\, J_{u_{i_1}, \ldots, u_{i_{k-1}}, U_j=1}(\hat{\theta}_{u_{i_1}, \ldots, u_{i_{k-1}}, U_j=1}) : j \in R_k \Big\}.
\end{aligned} \tag{13}
$$

Note that the criterion in (13) not only evaluates the observed information measure
associated with $U_j = 0$ and $U_j = 1$ but that re-evaluation of the measure for
$u_{i_1}, \ldots, u_{i_{k-1}}$ is also implied. Further, since its two terms are evaluated at different
values of $\theta$, the expression in (13), though an expected value, is not an instance
of Fisher’s information defined in (4).

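A minimal sketch of the criterion in (13) is given below. It assumes the logistic model in (1), the EAP estimator computed on a grid under a standard normal prior, and the fact that the observed information of a 2PL item is $a^2 p(1-p)$. The function names and the item parameters are hypothetical.

```python
import math

def p_2pl(theta, a, b):
    # Logistic function written with tanh, which avoids overflow in exp().
    return 0.5 * (1.0 + math.tanh(0.5 * a * (theta - b)))

def info_2pl(theta, a, b):
    p = p_2pl(theta, a, b)
    return a * a * p * (1.0 - p)

def eap_on_grid(responses, grid):
    """EAP estimate under a standard normal prior for responses [(a, b, u), ...]."""
    weights = []
    for t in grid:
        w = math.exp(-0.5 * t * t)
        for a, b, u in responses:
            p = p_2pl(t, a, b)
            w *= p if u == 1 else 1.0 - p
        weights.append(w)
    return sum(t * w for t, w in zip(grid, weights)) / sum(weights)

def select_max_predicted_expected_info(pool, administered, responses, grid):
    """Unused item maximizing the predicted expected information."""
    theta_now = eap_on_grid(responses, grid)

    def score(j):
        a, b = pool[j]
        p1 = p_2pl(theta_now, a, b)                   # predicted P(U_j = 1)
        total = 0.0
        for u, prob in ((0, 1.0 - p1), (1, p1)):
            updated = responses + [(a, b, u)]         # hypothetical response to item j
            theta_u = eap_on_grid(updated, grid)      # estimate after that response
            # Information of all k items, re-evaluated at the updated estimate.
            total += prob * sum(info_2pl(theta_u, ai, bi) for ai, bi, _ in updated)
        return total

    remaining = [j for j in range(len(pool)) if j not in administered]
    return max(remaining, key=score)

grid = [-4.0 + 0.05 * k for k in range(161)]
pool = [(0.3, 0.0), (2.0, 0.0), (1.0, 1.0)]           # hypothetical (a, b) pairs
print(select_max_predicted_expected_info(pool, {2}, [(1.0, 1.0, 0)], grid))
```

The inner sum covers the items already administered as well as the candidate, which reflects the re-evaluation of the information measure at the two predicted estimates.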
Minimum Predicted Expected Variance

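If the information measures in (13) are replaced by the predicted posterior
variances of $\theta$ for $U_j = 0$ and $U_j = 1$, the following criterion is obtained:

$$
\begin{aligned}
i_k = \arg\min_j \Big\{ & [1 - p_j(\hat{\theta}_{u_{i_1}, \ldots, u_{i_{k-1}}})]\, \mathrm{Var}(\theta \mid u_{i_1}, \ldots, u_{i_{k-1}}, U_j=0) \\
& + p_j(\hat{\theta}_{u_{i_1}, \ldots, u_{i_{k-1}}})\, \mathrm{Var}(\theta \mid u_{i_1}, \ldots, u_{i_{k-1}}, U_j=1) : j \in R_k \Big\}.
\end{aligned} \tag{14}
$$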



Though the use of information measures for item selection is a well-established 
practice in IRT, the reciprocal of the information measure is only a large-sample 
approximation to the true variance of the posterior. Therefore, from a Bayesian 
point of view, the criterion in (14) should be preferred over the one in (13). As 
already noted, the same criterion was proposed as an alternative to (9) in Owen 
(1975). Reviews of the criterion can be found, for example, in Thissen and Mislevy 
(1990) and Weiss (1982). 
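
The variance-based criterion can be sketched in the same way, with the posterior variance after each hypothetical response computed on a grid. The sketch assumes the logistic model in (1), a standard normal prior, and the EAP as the current estimate; the function names and item parameters are hypothetical.

```python
import math

def p_2pl(theta, a, b):
    # Logistic function written with tanh, which avoids overflow in exp().
    return 0.5 * (1.0 + math.tanh(0.5 * a * (theta - b)))

def posterior_mean_var(responses, grid):
    """Posterior mean and variance under a standard normal prior, on a grid."""
    weights = []
    for t in grid:
        w = math.exp(-0.5 * t * t)
        for a, b, u in responses:
            p = p_2pl(t, a, b)
            w *= p if u == 1 else 1.0 - p
        weights.append(w)
    total = sum(weights)
    mean = sum(t * w for t, w in zip(grid, weights)) / total
    var = sum((t - mean) ** 2 * w for t, w in zip(grid, weights)) / total
    return mean, var

def select_min_predicted_expected_variance(pool, administered, responses, grid):
    """Unused item minimizing the predicted expected posterior variance."""
    theta_now, _ = posterior_mean_var(responses, grid)

    def score(j):
        a, b = pool[j]
        p1 = p_2pl(theta_now, a, b)                   # predicted P(U_j = 1)
        var0 = posterior_mean_var(responses + [(a, b, 0)], grid)[1]
        var1 = posterior_mean_var(responses + [(a, b, 1)], grid)[1]
        return (1.0 - p1) * var0 + p1 * var1

    remaining = [j for j in range(len(pool)) if j not in administered]
    return min(remaining, key=score)

grid = [-4.0 + 0.05 * k for k in range(161)]
pool = [(0.3, 0.0), (2.0, 0.0), (1.0, 1.0)]           # hypothetical (a, b) pairs
print(select_min_predicted_expected_variance(pool, {2}, [(1.0, 1.0, 0)], grid))
```

Because the variance is computed from the posterior itself, no large-sample argument is needed to justify the quantity being minimized.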



Maximum Predicted Posterior Expected Information 

In the criterion in (13), observed information is predicted for wrong and correct
responses and the expectation is taken over these predictions. However, rather
than evaluating observed information at predicted point estimates, its expectation
over predicted posteriors could also be used. Let $g(\theta \mid u_{i_1}, \ldots, u_{i_{k-1}}, U_j=0)$ and
$g(\theta \mid u_{i_1}, \ldots, u_{i_{k-1}}, U_j=1)$ be the posteriors of $\theta$ after a wrong and a correct response
to item $j$, respectively. The proposal is to select as the $k$th item:

$$
\begin{aligned}
i_k = \arg\max_j \Big\{ & [1 - p_j(\hat{\theta}_{u_{i_1}, \ldots, u_{i_{k-1}}})] \int J_{u_{i_1}, \ldots, u_{i_{k-1}}, U_j=0}(\theta)\, g(\theta \mid u_{i_1}, \ldots, u_{i_{k-1}}, U_j=0)\, d\theta \\
& + p_j(\hat{\theta}_{u_{i_1}, \ldots, u_{i_{k-1}}}) \int J_{u_{i_1}, \ldots, u_{i_{k-1}}, U_j=1}(\theta)\, g(\theta \mid u_{i_1}, \ldots, u_{i_{k-1}}, U_j=1)\, d\theta : j \in R_k \Big\}.
\end{aligned} \tag{15}
$$

Note that this criterion combines the ideas underlying the criteria in (12)-(13): As in
(12), observed information is weighted by a posterior density, but at the same time
the criterion shares the idea of preposterior prediction with (13).
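
A minimal sketch of the criterion in (15) follows. For each candidate item and each hypothetical response, the observed information of all items is averaged over the updated posterior, and the two averages are combined with the predicted response probabilities. The sketch assumes the logistic model in (1), a standard normal prior, a grid approximation to the integrals, and the EAP as the current estimate; all names and parameter values are hypothetical.

```python
import math

def p_2pl(theta, a, b):
    # Logistic function written with tanh, which avoids overflow in exp().
    return 0.5 * (1.0 + math.tanh(0.5 * a * (theta - b)))

def info_2pl(theta, a, b):
    p = p_2pl(theta, a, b)
    return a * a * p * (1.0 - p)

def posterior_on_grid(responses, grid):
    """Normalized posterior weights under a standard normal prior."""
    weights = []
    for t in grid:
        w = math.exp(-0.5 * t * t)
        for a, b, u in responses:
            p = p_2pl(t, a, b)
            w *= p if u == 1 else 1.0 - p
        weights.append(w)
    total = sum(weights)
    return [w / total for w in weights]

def select_max_predicted_posterior_expected_info(pool, administered, responses, grid):
    """Unused item maximizing the predicted posterior expected information."""
    post_now = posterior_on_grid(responses, grid)
    theta_now = sum(t * w for t, w in zip(grid, post_now))    # current EAP

    def score(j):
        a, b = pool[j]
        p1 = p_2pl(theta_now, a, b)                           # predicted P(U_j = 1)
        total = 0.0
        for u, prob in ((0, 1.0 - p1), (1, p1)):
            updated = responses + [(a, b, u)]
            post_u = posterior_on_grid(updated, grid)         # predicted posterior
            # Information of all k items averaged over the predicted posterior.
            total += prob * sum(
                w * sum(info_2pl(t, ai, bi) for ai, bi, _ in updated)
                for t, w in zip(grid, post_u))
        return total

    remaining = [j for j in range(len(pool)) if j not in administered]
    return max(remaining, key=score)

grid = [-4.0 + 0.05 * k for k in range(161)]
pool = [(0.3, 0.0), (2.0, 0.0), (1.0, 1.0)]                   # hypothetical (a, b) pairs
print(select_max_predicted_posterior_expected_info(pool, {2}, [(1.0, 1.0, 0)], grid))
```

Compared with the point-estimate version, the only change is that the information is averaged over each predicted posterior instead of being evaluated at its mean.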

Additional Criteria 

The above criteria do not constitute an exhaustive set of posterior-based
criteria for item selection. For example, it is an easy step to generalize the criteria
in (13)-(15) to predictions two or more items ahead. For larger item pools,
the combinatorial complexity of such criteria would quickly exceed the possibility of
application to real-time adaptive testing, but for small pools the idea seems
attractive. Chang (1996) proposes to replace Fisher’s information by the
Kullback-Leibler measure. The same substitution could easily be made for the
criteria (12)-(13) and (15). Analogous to (15), the predicted probabilities in (14) could be
replaced by expectations over predicted posterior distributions. Finally, maximum
posterior variance between groups who score the item correct and incorrect was
proposed as an item selection criterion by Wainer, Lewis, Kaplan, and Braswell
(1992) (for an empirical comparison with the maximum-information criterion, see
Schnipke and Green, 1995).



Large-Sample Equivalence 

As is well known in Bayesian statistics, for $k \to \infty$ the posteriors
$g(\theta \mid u_{i_1}, \ldots, u_{i_{k-1}}, U_j=0)$ and $g(\theta \mid u_{i_1}, \ldots, u_{i_{k-1}}, U_j=1)$ converge to a common (degenerate)